{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CUDA科普课---带你走进GPU的并行世界"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.什么是CUDA？\n",
    "- CUDA  \n",
    "\n",
    "  Compute Unified Device Architecture\n",
    "\n",
    "- CUDA C/C++  \n",
    "\n",
    "  基于C/C++的编程方法\n",
    "  支持异构编程的扩展方法\n",
    "  简单明了的APIs，能够轻松的管理存储系统\n",
    "\n",
    "- CUDA支持的编程语言：  \n",
    "\n",
    "  C/C++/Python/Fortran/Java/…….\n",
    "  \n",
    "  \n",
    "![GPUvsCPU](image/GPUvsCPU.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.适用设备:\n",
    "- 所有包含NVIDIA GPU的服务器，工作站，个人电脑，嵌入式设备等电子设备\n",
    "- 软件安装:  \n",
    "\n",
    "    Windows：https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html\n",
    "\t只需安装一个.exe的可执行程序\n",
    "\n",
    "    Linux：https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html \n",
    "\t按照上面的教程，需要6 / 7 个步骤即可\n",
    "\n",
    "    Jetson： https://developer.nvidia.com/embedded/jetpack\n",
    "\t直接利用NVIDIA SDK Manager 或者 SD image进行刷机即可\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过下面的命令查看是否安装好CUDA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!nvcc -V"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过下面的命令查看是否有支持CUDA的 **NVIDIA GPU**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.GPU的硬件结构\n",
    "下图所示的是GA100的硬件架构图，它包含了：\n",
    "- 8192 FP32 CUDA Cores（用于计算的核心）\n",
    "- 128个SM（SM指stream multiprocessor，即流多处理器，可以方便一块线程之间的协作）\n",
    "- 每个SM包含64个FP32 CUDA Core，4个第三代Tensor Core  \n",
    "\n",
    "\n",
    "- Device\n",
    "![hardware](image/hardware.png)\n",
    "\n",
    "- SM  \n",
    "![sm](image/sm.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.CUDA的线程层次\n",
    "在计算机科学中，执行线程是可由调度程序独立管理的最小程序指令序列。  \n",
    "在GPU中，可以从多个层次管理线程：\n",
    "- Thread: sequential execution unit\n",
    "    所有线程执行相同的核函数\n",
    "    并行执行\n",
    "- Thread Block: a group of threads\n",
    "    执行在一个Streaming Multiprocessor (SM)\n",
    "    同一个Block中的线程可以协作\n",
    "- Thread Grid: a collection of thread blocks\n",
    "    一个Grid当中的Block可以在多个SM中执行\n",
    "\n",
    "![thread](image/thread.png)\n",
    "\n",
    "\n",
    "\n",
    "   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.CUDA程序的编写\n",
    "- kernel函数的实现\n",
    "    需要在核函数之前加上 **@cuda.jit**标识符  \n",
    "```Python\n",
    "    @cuda.jit\n",
    "    def add_kernel(x, y, out):\n",
    "```\n",
    "- kernel函数的调用\n",
    "    需要添加执行设置  \n",
    "    **add_kernel[blocks_per_grid, threads_per_block](x, y, out)**  \n",
    "    这里的blocks_per_grid代表Grid中block在x,y,z三个维度的数量  \n",
    "    这里的threads_per_block代表Block中thread在x,y,z三个维度的数量  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.CUDA线程索引\n",
    "- 我们可以通过cuda.threadIdx，cuda.blockIdx，cuda.blockDim，cuda.gridDim来确定每个线程要处理的数据  \n",
    "\n",
    "![thread_index2](image/thread_index2_python.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7.实际编程 \n",
    "接下来我们来尝试编写第一个CUDA程序。我们来实现一个向量加法的例子，将两个包含1000000个元素的向量相加  \n",
    "当我们用CPU实现时：\n",
    "```Python\n",
    "def vecAdd (n, a, b, c)\n",
    "    for i in range(n):    \n",
    "        c[i] = a[i] + b[i];\n",
    "```\n",
    "当我们用GPU实现时：\n",
    "```Python\n",
    "def add_kernel(x, y, out):\n",
    "    tx = cuda.threadIdx.x # 当前线程在block中的索引值\n",
    "    ty = cuda.blockIdx.x  # 当前线程所在block在grid中的索引值\n",
    "\n",
    "    block_size = cuda.blockDim.x  # 每个block有多少个线程\n",
    "    grid_size = cuda.gridDim.x    # 每个grid有多少个线程块\n",
    "    \n",
    "    start = tx + ty * block_size\n",
    "    stride = block_size * grid_size\n",
    "    \n",
    "    for i in range(start, x.shape[0], stride):\n",
    "        out[i] = x[i] + y[i]\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行下面的代码，来完成向量相加的例子"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numba import cuda, float32\n",
    "\n",
    "@cuda.jit\n",
    "def add_kernel(x, y, out):\n",
    "    tx = cuda.threadIdx.x \n",
    "    ty = cuda.blockIdx.x  \n",
    "\n",
    "    block_size = cuda.blockDim.x  \n",
    "    grid_size = cuda.gridDim.x    \n",
    "    \n",
    "    start = tx + ty * block_size\n",
    "    stride = block_size * grid_size\n",
    "    for i in range(start, x.shape[0], stride):\n",
    "        out[i] = x[i] + y[i]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "n = 100000\n",
    "x = np.arange(n).astype(np.float32)\n",
    "y = 2 * x\n",
    "out = np.empty_like(x)\n",
    "\n",
    "threads_per_block = 128\n",
    "blocks_per_grid = 30\n",
    "\n",
    "add_kernel[blocks_per_grid, threads_per_block](x, y, out)\n",
    "print(out[:20])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时，我们看到在计算向量运算的时候，GPU比CPU有明显的速度优势  \n",
    "**虽然这个例子很简单，但是在我们的实际应用中却经常用到。  \n",
    "比如：我们在对拍好的照片进行美化的时候，需要将照片的亮度调整。那么此时，我们就需要对每一个像素点的数值进行增大或者缩小。如果我们把图片的所有像素值想象成我们上面处理的向量，利用CUDA就可以非常有效的进行加速**\n",
    "![Pixel.jpg](image/Pixel.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**如上图所示，我们只需要让每个线程来调整一个像素中数值即可调整整张图片的亮度和对比度**  \n",
    "**接下来执行下面的代码，完成调整图片亮度的例子：**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一步，完成CUDA核函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cv2\n",
    "import numba\n",
    "import time\n",
    "import math\n",
    "\n",
    "#GPU function\n",
    "@cuda.jit\n",
    "def process_gpu(img,rows,cols,channels):\n",
    "    tx = cuda.blockIdx.x*cuda.blockDim.x+cuda.threadIdx.x\n",
    "    ty = cuda.blockIdx.y*cuda.blockDim.y+cuda.threadIdx.y\n",
    "    if tx<rows and ty<cols:                             \n",
    "        for c in range(channels):\n",
    "            color = img[tx,ty][c]*2.0+30\n",
    "            if color>255:\n",
    "                img[tx,ty][c]=255\n",
    "            elif color<0:\n",
    "                img[tx,ty][c]=0\n",
    "            else:\n",
    "                img[tx,ty][c]=color"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第二步，实现CPU端处理的代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#cpu function\n",
    "def process_cpu(img):\n",
    "    rows,cols,channels=img.shape\n",
    "    for i in range(rows):\n",
    "        for j in range(cols):\n",
    "            for c in range(3):\n",
    "                color=img[i,j][c]*2.0+30\n",
    "                if color>255:\n",
    "                    img[i,j][c]=255\n",
    "                elif color<0:\n",
    "                    img[i,j][c]=0\n",
    "                else:\n",
    "                    img[i,j][c]=color"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第三步，定义main函数，利用opencv读取图片，并将它分别交给CPU和GPU进行处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def main_image_process():\n",
    "    #Create an image.\n",
    "    filename = 'test1.jpg'\n",
    "    img = cv2.imread(filename)\n",
    "    img2 = cv2.imread(filename)\n",
    "    img = cv2.resize(img,(1000,1000),interpolation = cv2.INTER_AREA)\n",
    "    img2 = cv2.resize(img,(1000,1000),interpolation = cv2.INTER_AREA)\n",
    "    rows,cols,channels=img.shape\n",
    "    dst_cpu = img.copy()\n",
    "    dst_gpu = img.copy()\n",
    "    start_cpu = time.time()\n",
    "    process_cpu(img)\n",
    "    end_cpu = time.time()\n",
    "    time_cpu = (end_cpu-start_cpu)\n",
    "    print(\"CPU process time: \"+str(time_cpu))\n",
    "    ##GPU function\n",
    "    threadsperblock = (16,16)\n",
    "    blockspergrid_x = int(math.ceil(rows/threadsperblock[0]))\n",
    "    blockspergrid_y = int(math.ceil(cols/threadsperblock[1]))\n",
    "    blockspergrid = (blockspergrid_x,blockspergrid_y)\n",
    "    start_gpu = time.time()\n",
    "    dImg = cuda.to_device(img2)\n",
    "    cuda.synchronize()\n",
    "    process_gpu[blockspergrid,threadsperblock](dImg,rows,cols,channels)\n",
    "    cuda.synchronize()\n",
    "    end_gpu = time.time()\n",
    "    dst_gpu = dImg.copy_to_host()\n",
    "    time_gpu = (end_gpu-start_gpu)\n",
    "    print(\"GPU process time: \"+str(time_gpu))\n",
    "    #save\n",
    "    cv2.imwrite(\"result_cpu.png\",img)\n",
    "    cv2.imwrite(\"result_gpu.png\",dst_gpu)\n",
    "    print(\"Done.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第四步，执行，得到处理结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "main_image_process()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时，查看[result_cpu.png](result_cpu.png)和[result_gpu.png](result_gpu.png)。我们很清楚的看到，完成同样的事情并得到相同的结果，CPU比GPU用了更多的时间"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8.矩阵相乘  \n",
    "完成上一个例子，您就已经能够进行简单的CUDA编程了。接下来我们体验一下难一点的例子，矩阵相乘。  \n",
    "矩阵操作在很多领域都有非常广泛的应用，比如在非常热门的卷积神经网络中的卷积操作，就可以利用矩阵乘来完成。接下来，我们就尝试利用CUDA来加速矩阵相乘的操作。  \n",
    "下面展示了如何利用CPU来完成矩阵相乘的操作    \n",
    "\n",
    "```Python\n",
    "def matmul_cpu(A,B,C):\n",
    "\tfor y in range(B.shape[1]):\n",
    "\t\tfor x in range(A.shape[0]):\n",
    "\t\t\ttmp = 0\n",
    "\t\t\tfor k in range(A.shape[1]):\n",
    "\t\t\t\ttmp += A[x,k]*B[k,y]\n",
    "\t\t\tC[x,y] = tmp\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时，我们看到，CPU代码的逻辑是，逐一求出结果矩阵P中的每一个元素的值。   \n",
    "而利用CUDA来加速时，我们要申请与C矩阵中元素个数相等的线程，每个线程来处理求一个C矩阵中的元素值，最终并行执行得到结果矩阵。\n",
    "\n",
    "```Python\n",
    "@cuda.jit\n",
    "def matmul_gpu(A,B,C):\n",
    "\trow,col = cuda.grid(2)\n",
    "\tif row < C.shape[0] and col < C.shape[1]:\n",
    "\t\ttmp = 0.\n",
    "\t\tfor k in range(A.shape[1]):\n",
    "\t\t\ttmp += A[row,k]*B[k,col]\n",
    "\t\tC[row,col] = tmp\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**执行下面的代码，完成矩阵相乘的例子**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一步，实现CPU端代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def matmul_cpu(A,B,C):\n",
    "    for y in range(B.shape[1]):\n",
    "        for x in range(A.shape[0]):\n",
    "            tmp = 0\n",
    "            for k in range(A.shape[1]):\n",
    "                tmp += A[x,k]*B[k,y]\n",
    "            C[x,y] = tmp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第二步，实现CUDA核函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@cuda.jit\n",
    "def matmul_gpu(A,B,C):\n",
    "    row,col = cuda.grid(2)\n",
    "    if row < C.shape[0] and col < C.shape[1]:\n",
    "        tmp = 0.\n",
    "        for k in range(A.shape[1]):\n",
    "            tmp += A[row,k]*B[k,col]\n",
    "        C[row,col] = tmp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第三步，利用SM中的Shared memory来优化核函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TPB = 16\n",
    "@cuda.jit\n",
    "def matmul_shared_mem(A,B,C):\n",
    "    sA = cuda.shared.array(shape=(TPB,TPB), dtype=float32)\n",
    "    sB = cuda.shared.array(shape=(TPB,TPB), dtype=float32)\n",
    "\n",
    "    x,y = cuda.grid(2)\n",
    "\n",
    "    tx = cuda.threadIdx.x\n",
    "    ty = cuda.threadIdx.y\n",
    "    if x>=C.shape[0] or y >= C.shape[1]:\n",
    "        return\n",
    "    tmp = 0.\n",
    "    for i in range(int(A.shape[1]/TPB)):\n",
    "        sA[tx, ty] = A[x, ty+i*TPB]\n",
    "        sB[tx, ty] = B[tx+i*TPB, y]\n",
    "        cuda.syncthreads()\n",
    "        for j in range(TPB):\n",
    "            tmp += sA[tx,j]*sB[j,ty]\n",
    "    C[x,y] = tmp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第四步，定义main函数，在这部中，我们初始化A，B矩阵，并将数据传输给GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def main_matrix_mul():\n",
    "    TPB = 16\n",
    "    A = np.full((TPB*10,TPB*10), 3.0, np.float)\n",
    "    B = np.full((TPB*10,TPB*10), 4.0, np.float)\n",
    "    C_cpu = np.full((A.shape[0],B.shape[1]), 0, np.float)\n",
    "    \n",
    "    #Start in CPU\n",
    "    print(\"Start processing in CPU\")\n",
    "    start_cpu = time.time()\n",
    "    matmul_cpu(A,B,C_cpu)\n",
    "    end_cpu = time.time()\n",
    "    time_cpu = (end_cpu - start_cpu)\n",
    "    print(\"CPU time: \"+str(time_cpu))\n",
    "    \n",
    "    #Start in GPU\n",
    "    A_global_mem = cuda.to_device(A)\n",
    "    B_global_mem = cuda.to_device(B)\n",
    "    \n",
    "    C_global_mem = cuda.device_array((A.shape[0],B.shape[1]))\n",
    "    C_shared_mem = cuda.device_array((A.shape[0],B.shape[1]))\n",
    "    \n",
    "    threadsperblock = (TPB, TPB)\n",
    "    blockspergrid_x = int(math.ceil(A.shape[0]/threadsperblock[0]))\n",
    "    blockspergrid_y = int(math.ceil(A.shape[1]/threadsperblock[1]))\n",
    "    blockspergrid = (blockspergrid_x,blockspergrid_y)\n",
    "    \n",
    "    print(\"Start processing in GPU\")\n",
    "    start_gpu = time.time()\n",
    "    matmul_gpu[blockspergrid, threadsperblock](A_global_mem,B_global_mem,C_global_mem)\n",
    "    cuda.synchronize()\n",
    "    end_gpu = time.time()\n",
    "    time_gpu = (end_gpu - start_gpu)\n",
    "    print(\"GPU time(global memory):\"+str(time_gpu))\n",
    "    C_global_gpu = C_global_mem.copy_to_host()\n",
    "    \n",
    "    print(\"Start processing in GPU (shared memory)\")\n",
    "    start_gpu = time.time()\n",
    "    matmul_shared_mem[blockspergrid, threadsperblock](A_global_mem,B_global_mem,C_global_mem)\n",
    "    cuda.synchronize()\n",
    "    end_gpu = time.time()\n",
    "    time_gpu = (end_gpu - start_gpu)\n",
    "    print(\"GPU time(shared memory):\"+str(time_gpu))\n",
    "    C_shared_gpu = C_shared_mem.copy_to_host"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第五步，执行main函数，对比使用不同的方法来加速矩阵乘的速度差异"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "main_matrix_mul()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时，我们可以看到，GPU比CPU有明显的速度优势。这样，我们如果在人工智能或者深度学习中利用GPU进行加速，就可以取得非常大的性能优势"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**接下来，我们再看一个利用卷积操作对图像进行模糊的例子**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "请连续执行下面三个代码块"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@cuda.jit\n",
    "def convolve(result, mask, image):\n",
    "    i, j = cuda.grid(2) \n",
    "    image_rows, image_cols, channels = image.shape\n",
    "    if (i <= image_rows) and (j <= image_cols): \n",
    "        delta_rows = mask.shape[0] // 2 \n",
    "        delta_cols = mask.shape[1] // 2\n",
    "        for c in range(3):\n",
    "            result[i, j][c] = 0\n",
    "        for k in range(mask.shape[0]):\n",
    "            for l in range(mask.shape[1]):\n",
    "                i_k = i - k + delta_rows\n",
    "                j_l = j - l + delta_cols\n",
    "                if (i_k >= 0) and (i_k < image_rows) and (j_l >= 0) and (j_l < image_cols):\n",
    "                    for c in range(3):\n",
    "                        result[i, j][c] += mask[k, l] * image[i_k, j_l][c]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def main_convolve():\n",
    "    filename = 'test1.jpg'\n",
    "    img = cv2.imread(filename).astype(np.float32)\n",
    "    dImg = cuda.to_device(img)\n",
    "    result = cuda.to_device(img)\n",
    "    rows,cols,channels=img.shape\n",
    "    dst_gpu = img.copy()\n",
    "    \n",
    "    mask = np.random.rand(13, 13).astype(np.float32) \n",
    "    mask /= mask.sum() \n",
    "\n",
    "    ##GPU function\n",
    "    threadsperblock = (16,16)\n",
    "    blockspergrid_x = int(math.ceil(rows/threadsperblock[0]))\n",
    "    blockspergrid_y = int(math.ceil(cols/threadsperblock[1]))\n",
    "    blockspergrid = (blockspergrid_x,blockspergrid_y)\n",
    "    cuda.synchronize()\n",
    "    start_gpu = time.time()\n",
    "    convolve[blockspergrid,threadsperblock](result,mask,dImg)\n",
    "    cuda.synchronize()\n",
    "    end_gpu = time.time()\n",
    "    result_gpu = result.copy_to_host()\n",
    "    time_gpu = (end_gpu-start_gpu)\n",
    "    print(\"GPU process time: \"+str(time_gpu))\n",
    "    #save\n",
    "    cv2.imwrite(\"result_gpu_convolve.png\",result_gpu)\n",
    "    print(\"Done.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "main_convolve()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时我们打开[result_gpu_convolve.png](result_gpu_convolve.png),可以看到利用GPU对[test1.jpg](test1.jpg)进行模糊处理过的结果   \n",
    "处理前   \n",
    "![test1.jpg](test1.jpg)  \n",
    "处理后   \n",
    "![result_gpu_convolve.png](result_gpu_convolve.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![performance](image/performance.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
